58 research outputs found

    Proof of concept: concept-based biomedical information retrieval

    In this thesis we investigate the possibility of integrating domain-specific knowledge into biomedical information retrieval (IR). Recent decades have shown a fast-growing interest in biomedical research, reflected by an exponential growth in scientific literature. An important problem for biomedical IR is dealing with the complex and inconsistent terminology encountered in biomedical publications. Dealing with the terminology problem requires domain knowledge stored in terminological resources: controlled indexing vocabularies and thesauri. The integration of this knowledge in modern word-based information retrieval is, however, far from trivial.
    The first research theme investigates heuristics for obtaining word-based representations from biomedical text for robust word-based retrieval. We investigated the effect of choices in document preprocessing heuristics on retrieval effectiveness. Document preprocessing heuristics such as stop word removal, stemming, and breakpoint identification and normalization were shown to strongly affect retrieval performance. An effective combination of heuristics was identified to obtain a word-based representation from text for the remainder of this thesis.
    The second research theme deals with concept-based retrieval. We compared a word-based to a concept-based representation and determined to what extent a manual concept-based representation can be automatically obtained from text. Retrieval based on only concepts was demonstrated to be significantly less effective than word-based retrieval. This deteriorated performance could be explained by errors in the classification process, limitations of the concept vocabularies and limited exhaustiveness of the concept-based document representations. Retrieval based on a combination of word-based and automatically obtained concept-based query representations significantly improved on word-only retrieval.
    In the third and last research theme we propose a cross-lingual framework for monolingual biomedical IR. In this framework, the integration of a concept-based representation is viewed as a cross-lingual matching problem involving a word-based and a concept-based representation language. This framework gives us the opportunity to adopt a large set of established cross-lingual information retrieval (CLIR) methods and techniques for this domain. Experiments with basic term-to-term translation models demonstrate that this approach can significantly improve word-based retrieval.
    Directions for future work are using these concepts for communication between user and retrieval system, extending the translation models, and extending CLIR-enhanced concept-based retrieval outside the biomedical domain.

    Biomedical cross-language information retrieval

    Enhancing Access To Classic Children’s Literature

    Project Gutenberg is a digital library that contains mostly public domain books, including a large number of works that belong to children’s literature. Many of these classic books are offered in a text-only format, which does not make them appealing for children to read. Moreover, stories that were written for children one hundred or more years ago might not be readily understandable by children today due to diverging vocabularies and experiences. In this poster, we describe ongoing work to enhance access to this children’s literature repository. Firstly, we attempt to automatically illustrate the children’s literature. Secondly, we link the text to background information to increase understanding and ease of reading. The overall motivation of this work is to make such publicly available books more easily accessible to children by making them more entertaining and engaging.

    University of Twente at GeoCLEF 2006: Geofiltered Document Retrieval

    Concept based document retrieval for genomics literature

    Cross Language Information Retrieval for Biomedical Literature

    Learning to extract folktale keywords

    Audience and the Use of Minority Languages on Twitter

    On Twitter, many users tweet in more than one language. In this study, we examine the use of two Dutch minority languages. Users can engage with different audiences, and by analyzing different types of tweets we find that characteristics of the audience influence whether a minority language is used. Furthermore, while most tweets are written in Dutch, in conversations users often switch to the minority language.